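With the shift toward on-device deep learning, ensuring consistent behavior of an AI service across diverse compute platforms becomes very important. Our work tackles the emerging problem of reducing predictive inconsistencies that arise as negative flips: test samples that are predicted correctly by a less accurate model but incorrectly by a more accurate one. We introduce Regression-constrained Neural Architecture Search (Reg-NAS) to design a family of highly accurate models that produce fewer negative flips. Reg-NAS consists of two components: (1) a novel architecture constraint that lets a larger model contain the weights of a smaller one, thereby maximizing weight sharing. This idea stems from our observation that greater weight sharing between networks leads to similar sample-wise predictions and results in fewer negative flips. (2) A novel search reward that incorporates both Top-1 accuracy and negative flips in the architecture search metric. We demonstrate that Reg-NAS can successfully find desirable architectures with few negative flips in three popular architecture search spaces. Compared to the existing state-of-the-art approach, Reg-NAS achieves a 33-48% relative reduction in negative flips.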
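The negative-flip definition above can be made concrete with a small sketch. It only counts flips between two sets of predictions; the function names and the toy data are illustrative assumptions, not part of Reg-NAS itself.

```python
from typing import Sequence


def top1_accuracy(labels: Sequence[int], preds: Sequence[int]) -> float:
    """Fraction of samples whose predicted class equals the label."""
    if len(labels) != len(preds):
        raise ValueError("labels and preds must have the same length")
    if not labels:
        return 0.0
    return sum(1 for y, p in zip(labels, preds) if y == p) / len(labels)


def negative_flip_rate(
    labels: Sequence[int],
    small_preds: Sequence[int],
    large_preds: Sequence[int],
) -> float:
    """Fraction of samples the smaller model gets right and the larger one gets wrong."""
    if not (len(labels) == len(small_preds) == len(large_preds)):
        raise ValueError("all inputs must have the same length")
    if not labels:
        return 0.0
    flips = sum(
        1
        for y, s, l in zip(labels, small_preds, large_preds)
        if s == y and l != y
    )
    return flips / len(labels)


# Toy data (hypothetical): five samples, three classes.
labels = [0, 1, 2, 1, 0]
small = [0, 1, 0, 1, 1]  # less accurate model
large = [0, 2, 2, 1, 0]  # more accurate model, but wrong on sample 1

print(top1_accuracy(labels, small))              # 0.6
print(top1_accuracy(labels, large))              # 0.8
print(negative_flip_rate(labels, small, large))  # 0.2
```

In this toy case the larger model is more accurate overall, yet it still regresses on one sample that the smaller model classified correctly.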
translated by 谷歌翻译
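With the rise of image-based binary representations, computer vision is playing an increasingly important role in automated malware detection. These binary images are fast to generate, require no feature engineering, and are resilient to popular obfuscation methods. Significant research has been conducted in this area, but it has been restricted to small-scale or private datasets that only a few industry labs and research teams can access. This lack of availability hinders the examination of existing work, the development of new research, and the dissemination of ideas. We release MalNet-Image, the largest public cybersecurity image database, offering 24x more images and 70x more classes than existing databases (available at https://mal-net.org). MalNet-Image contains over 1.2 million malware images, spanning 47 types and 696 families, and democratizes image-based malware capabilities by enabling researchers and practitioners to evaluate techniques that were previously reported only in proprietary settings. We report the first million-scale malware detection results on binary images. MalNet-Image unlocks new and unique opportunities to advance the frontiers of machine learning, enabling new research directions in vision-based cyber defenses, multi-class imbalanced classification, and interpretable security.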
Cashews are grown by over 3 million smallholders in more than 40 countries worldwide as a principal source of income. As the third largest cashew producer in Africa, Benin has nearly 200,000 smallholder cashew growers contributing 15% of the country's national export earnings. However, a lack of information on where and how cashew trees grow across the country hinders decision-making that could support increased cashew production and poverty alleviation. By leveraging 2.4-m Planet Basemaps and 0.5-m aerial imagery, newly developed deep learning algorithms, and large-scale ground truth datasets, we produced the first national map of cashew in Benin and characterized the expansion of cashew plantations between 2015 and 2021. In particular, we developed a SpatioTemporal Classification with Attention (STCA) model to map the distribution of cashew plantations, which can fully capture texture information from discriminative time steps during a growing season. We further developed a Clustering Augmented Self-supervised Temporal Classification (CASTC) model to distinguish high-density from low-density cashew plantations by automatic feature extraction and optimized clustering. Results show that the STCA model achieved an overall accuracy of 80% and the CASTC model an overall accuracy of 77.9%. We found that the cashew area in Benin doubled between 2015 and 2021, with 60% of new plantation development coming from cropland or fallow land, while encroachment of cashew plantations into protected areas increased by 70%. Only half of cashew plantations were high-density in 2021, suggesting high potential for intensification. Our study illustrates the power of combining high-resolution remote sensing imagery and state-of-the-art deep learning algorithms to better understand tree crops in the heterogeneous smallholder landscape.
Attention mechanisms form a core component of several successful deep learning architectures, and are based on one key idea: "The output depends only on a small (but unknown) segment of the input." In several practical applications like image captioning and language translation, this is mostly true. In trained models with an attention mechanism, the outputs of an intermediate module that encodes the segment of input responsible for the output are often used as a way to peek into the "reasoning" of the network. We make this notion more precise for a variant of the classification problem, which we term selective dependence classification (SDC), when used with attention model architectures. In this setting, we demonstrate various error modes in which an attention model can be accurate but fail to be interpretable, and show that such models do occur as a result of training. We illustrate various situations that can accentuate or mitigate this behaviour. Finally, we use our objective definition of interpretability for SDC tasks to evaluate a few attention model learning algorithms designed to encourage sparsity, and demonstrate that these algorithms help improve interpretability.
Artificial Intelligence (AI) has become commonplace for solving routine everyday tasks. Because of the exponential growth in medical imaging data volume and complexity, the workload on radiologists is steadily increasing. We project that the gap between the number of imaging exams and the number of expert radiologist readers required to cover this increase will continue to expand, consequently introducing a demand for AI-based tools that improve the efficiency with which radiologists can comfortably interpret these exams. AI has been shown to improve efficiency in medical-image generation, processing, and interpretation, and a variety of such AI models have been developed across research labs worldwide. However, very few of these, if any, find their way into routine clinical use, a discrepancy that reflects the divide between AI research and successful AI translation. To address the barrier to clinical deployment, we have formed the MONAI Consortium, an open-source community that is building standards for AI deployment in healthcare institutions and developing tools and infrastructure to facilitate their implementation. This report represents several years of weekly discussions and hands-on problem-solving experience by groups of industry experts and clinicians in the MONAI Consortium. We identify barriers between AI-model development in research labs and subsequent clinical deployment, and propose solutions. Our report provides guidance on the processes that take an imaging AI model from development to clinical implementation in a healthcare institution. We discuss various AI integration points in a clinical Radiology workflow. We also present a taxonomy of Radiology AI use-cases. Through this report, we intend to educate the stakeholders in healthcare and AI (AI researchers, radiologists, imaging informaticists, and regulators) about cross-disciplinary challenges and possible solutions.
Radiance Fields (RFs) are a popular representation of casually captured scenes for novel view generation and have been used in applications beyond it. Understanding and manipulating scenes represented as RFs must naturally follow in order to facilitate mixed reality in personal spaces. Semantic segmentation of objects in the 3D scene is an important step toward that goal. Prior segmentation efforts using feature distillation show promise but do not scale to complex objects with diverse appearance. We present a framework to interactively segment objects with fine structure. Nearest-neighbor feature matching identifies high-confidence regions of the objects using distilled features. Bilateral filtering in a joint spatio-semantic space grows the region to recover an accurate segmentation. We show state-of-the-art results in segmenting objects from RFs and compositing them into another scene, changing their appearance, etc., moving closer to rich scene manipulation and understanding. Project Page: https://rahul-goel.github.io/isrf/
Poor electric power quality can disturb normal equipment performance, speed up aging, and even cause outright failures, which in turn reduces system dependability and raises maintenance costs. This study implements and tests a prototype of a wavelet-based Online Sequential Extreme Learning Machine (OS-ELM) classifier for detecting power quality problems under transient conditions. The classifier combines the OS-ELM network model with the discrete wavelet transform (DWT). First, DWT multi-resolution analysis (MRA) is used to extract features of the distorted signal at various resolutions. The OS-ELM then classifies the extracted data by transient duration and energy features to determine the kind of disturbance. The proposed approach requires less memory and processing time because it can reduce a large number of the distorted signal's features without changing the signal's original quality. Several types of transient events were used to demonstrate the classifier's ability to detect and categorize various types of power disturbances, including sags, swells, momentary interruptions, oscillatory transients, harmonics, notches, spikes, flickers, sag swell, sag mi, sag harm, swell trans, sag spike, and swell spike.
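As a rough illustration of the feature-extraction step, the sketch below computes per-level energy features with a multi-level DWT. It assumes the Haar wavelet purely for simplicity (the abstract does not name the wavelet used), and the function names and test signal are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def haar_dwt_step(signal: Sequence[float]) -> Tuple[List[float], List[float]]:
    """One level of the orthonormal Haar DWT: (approximation, detail)."""
    scale = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / scale for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / scale for i in range(0, len(signal), 2)]
    return approx, detail


def mra_energy_features(signal: Sequence[float], levels: int) -> List[float]:
    """Energy of the detail coefficients at each level, then the final approximation energy."""
    if levels < 1 or len(signal) % (2 ** levels) != 0:
        raise ValueError("signal length must be a multiple of 2**levels")
    energies = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = haar_dwt_step(approx)
        energies.append(sum(d * d for d in detail))
    energies.append(sum(a * a for a in approx))
    return energies


# Hypothetical test signal: flat waveform with a single spike.
signal = [1.0, 1.0, 1.0, 1.0, 5.0, 1.0, 1.0, 1.0]
features = mra_energy_features(signal, levels=3)
print(features)
```

Because the Haar transform is orthonormal, the features sum to the energy of the input signal, so a short feature vector summarizes how the disturbance energy is distributed across resolutions. A classifier such as OS-ELM would take vectors like this as input.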
Arbitrary Style Transfer is a technique for producing a new image from two images: a content image and a style image. The newly produced image is unseen and is generated by the algorithm itself. Balancing the structure and style components has been the major challenge that other state-of-the-art algorithms have tried to solve. Despite these efforts, it remains a major challenge to apply the original artistic style on top of the structure of the content image while maintaining consistency. In this work, we address these problems with a deep learning approach based on Convolutional Neural Networks. Our implementation first separates the foreground from the background of the content image using the pre-trained Detectron2 model, and then applies the Arbitrary Style Transfer technique used in SANet. Once we have the two styled images, we stitch the two pieces together to form the complete final image.
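The final stitching step can be sketched as a mask-based composite. This is a minimal illustration that assumes a binary foreground mask is already available (for example from a segmentation model) and that both styled images have the same size. The function name and toy pixel values are hypothetical, and the style transfer itself is not shown.

```python
from typing import List, Sequence


def stitch_with_mask(
    styled_fg: Sequence[Sequence[int]],
    styled_bg: Sequence[Sequence[int]],
    mask: Sequence[Sequence[bool]],
) -> List[List[int]]:
    """Take foreground pixels where the mask is True and background pixels elsewhere."""
    if not (len(styled_fg) == len(styled_bg) == len(mask)):
        raise ValueError("inputs must have the same number of rows")
    stitched = []
    for fg_row, bg_row, mask_row in zip(styled_fg, styled_bg, mask):
        if not (len(fg_row) == len(bg_row) == len(mask_row)):
            raise ValueError("rows must have the same width")
        stitched.append(
            [fg if m else bg for fg, bg, m in zip(fg_row, bg_row, mask_row)]
        )
    return stitched


# Toy 2x2 grayscale example (hypothetical values).
fg = [[1, 1], [1, 1]]
bg = [[9, 9], [9, 9]]
mask = [[True, False], [False, True]]
result = stitch_with_mask(fg, bg, mask)
print(result)  # [[1, 9], [9, 1]]
```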
Machine learning (ML) algorithms are remarkably good at approximating complex non-linear relationships. Most ML training processes, however, are designed to deliver ML tools with good average performance, and do not offer any guarantees about their worst-case estimation error. For safety-critical systems such as power systems, this poses a major barrier to their adoption. So far, approaches could determine worst-case violations only for already-trained ML algorithms. To the best of our knowledge, this is the first paper to introduce a neural network training procedure designed to achieve both good average performance and minimum worst-case violations. Using the Optimal Power Flow (OPF) problem as a guiding application, our approach (i) introduces a framework that reduces the worst-case generation constraint violations during training, incorporating them as a differentiable optimization layer; and (ii) presents a neural network sequential learning architecture to significantly accelerate it. We demonstrate the proposed architecture on four different test systems ranging from 39 buses to 162 buses, for both AC-OPF and DC-OPF applications.
Temporal action segmentation assigns an action label to every frame of an untrimmed input video containing multiple actions in sequence. For this task, we propose an encoder-decoder-style architecture named C2F-TCN, featuring a "coarse-to-fine" ensemble of decoder outputs. The C2F-TCN framework is enhanced with a novel, model-agnostic temporal feature augmentation strategy based on the computationally inexpensive stochastic max-pooling of segments. It produces more accurate and well-calibrated supervised results on three benchmark action segmentation datasets. We show that the architecture is flexible for both supervised and representation learning. In line with this, we present a novel unsupervised way to learn frame-wise representations from C2F-TCN. Our unsupervised learning approach hinges on the clustering capabilities of the input features and the formation of multi-resolution features from the decoder's implicit structure. Further, we provide the first semi-supervised temporal action segmentation results by merging representation learning with conventional supervised learning. Our semi-supervised learning scheme, called "Iterative-Contrastive-Classify (ICC)", progressively improves in performance with more labeled data. ICC semi-supervised learning in C2F-TCN, with 40% labeled videos, performs similarly to its fully supervised counterparts.
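One possible reading of "stochastic max-pooling of segments" is sketched below: the frame-wise feature sequence is cut into consecutive segments whose lengths are drawn at random, and each segment is max-pooled over time. The sampling scheme, function name, and parameters are assumptions made for illustration and may differ from the augmentation actually used in C2F-TCN.

```python
import random
from typing import List, Optional, Sequence


def stochastic_segment_max_pool(
    features: Sequence[Sequence[float]],
    min_window: int,
    max_window: int,
    rng: Optional[random.Random] = None,
) -> List[List[float]]:
    """Max-pool consecutive temporal segments whose lengths are sampled at random.

    `features` is a list of frames, each frame a list of feature values.
    """
    if min_window < 1 or max_window < min_window:
        raise ValueError("require 1 <= min_window <= max_window")
    rng = rng or random.Random()
    pooled = []
    start = 0
    while start < len(features):
        window = rng.randint(min_window, max_window)  # inclusive bounds
        segment = features[start:start + window]
        # Element-wise maximum over the frames in this segment.
        pooled.append([max(column) for column in zip(*segment)])
        start += window
    return pooled


# Toy sequence (hypothetical): five frames with two feature channels each.
frames = [[1.0, 5.0], [3.0, 2.0], [0.0, 0.0], [4.0, 1.0], [7.0, 7.0]]

# With a fixed window of 2 the result is deterministic.
print(stochastic_segment_max_pool(frames, 2, 2))
# [[3.0, 5.0], [4.0, 1.0], [7.0, 7.0]]

# With a random window in [1, 3] each call yields a different temporal resolution.
augmented = stochastic_segment_max_pool(frames, 1, 3, random.Random(0))
print(len(augmented))
```

Because the pooling has no learned parameters, an augmentation of this kind can be applied to the inputs of any temporal model.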